Let’s install needed packages.
install.packages(c("rgdal", "rgeos", "dplyr",
"raster", "tabularaster",
"ggplot2", "ggrepel",
"digest", "maptools", "readr",
"sf", "eurostat", "WDI"))library(rgdal)
library(rgeos)
library(dplyr)
library(raster)
library(tabularaster)
library(ggplot2)
library(ggrepel)
library(digest)
library(maptools)
library(readr)
library(sf)
library(eurostat)
library(WDI)The first detailed example will be explained for European countries.
Similar analyses for European regions, US states and world countries will follow.
In this tutorial I used DMSP-OLS data for 2013 zipped in this archive – a specific file is called F182013.v4c_web.stable_lights.avg_vis.tif (it’s size - 692 MB). Due to GitHub file size limitations you have to download this file manually and store in the local files folder.
Raster data is commonly used to represent spatially continuous phenomena. A raster divides the world into a grid of equally sized rectangles (referred to as cells or, in the context of satellite remote sensing, pixels) that all have one or more values (or missing values) for the variables of interest.
The raster() function is used to import data in the raster package, if the file has only one layer (i.e. for each cell / pixel there is only one value stored in it) or the brick() function, if there are more layers.
Nighttime light data contains only one layer
I will limit the raster data only to the EU area. To do this, let’s also load shapefile file with a map of EU countries (obtained from Eurostat - a particular file used is UTS_RG_10M_2016_4326_LEVL_0.shp.zip, NUTS0 is the national level).
## Reading layer `NUTS_RG_10M_2013_4326_LEVL_0' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\NUTS_RG_10M_2013_4326_LEVL_0.shp' using driver `ESRI Shapefile'
## Simple feature collection with 35 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -63.08826 ymin: -21.38731 xmax: 55.83663 ymax: 71.17354
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
Let’s limit the map to the mainland (we cut all the islands). For spatial limitation of sf objects one can use a function st_crop(), in which the second argument is the object defining the bounding box.
## xmin ymin xmax ymax
## -63.08826 -21.38731 55.83663 71.17354
# define own bounding box
my_bbox = c(xmin = -11,
ymin = 30,
xmax = 55.83616,
ymax = 71.15304)
# limit the map
map_nuts0 <- st_crop(map_nuts0,
my_bbox)## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
Lets see the map again.
One can also use ggplot2 for mapping sf objects.
Presenting the values of the selected variable requires mapping its name on the fill aestetics.
To be able to impose a map on raster data, one has to make sure that THE SAME geographical coordinate projection is used in BOTH datasets.
Let’s check what are the current values of these parameters.
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## [1] "+proj=longlat +datum=WGS84 +no_defs"
Seems identical (WGS84).
If there is a need to unify the projections, one can simply copy the one used in the raster file into the map. The st_transform() function can be used for this:
To apply spatial limitation of the raster data set one has to use a function crop(), in which the second argument is the extent of the area. It must be an Extent class object, resulting from the use of the extent() function.
Lets see the extent of our raster data and for the map:
## class : Extent
## xmin : -180.0042
## xmax : 180.0042
## ymin : -65.00417
## ymax : 75.00417
## class : Extent
## xmin : -10.47241
## xmax : 44.82055
## ymin : 34.5634
## ymax : 71.15304
We crop the raster data to the rectangle, where the European Union is located.
Let’s see the result.
## png
## 2
One can clearly see European continent and major cities. Lets super-impose the map of EU regions.
## png
## 2
This data still contains areas outside the EU. Raster data for pixels outside the polygons can be reemoved (set to missing) with the mask() function.
The more polygons (regions), the more time it takes (here ca. 40 seconds).
Let’s display data truncated to the EU area imposing once again the grid of countries borders.
## png
## 2
Data have been correctly restricted to the European Union area.
Very efficient identification and aggregation of data for selected areas can be done by using the cellnumbers() function from the tabularaster package by Michael D. Sumner. It allows to identify numbers (indexes) of cells located in every region (polygon in the map object). And aggregation based on indexed data is ca. 500 times faster than with using purely the extract() function.
The only problem here is that some of the countries have several islands (the geometry of type MULTIPOLYGON) which is not YET SUPPORTED by cellnumbers().
So for this simple example lets convert all geometries to just POLYGONs (inc case of a MULTIPOLYGON only the first will be used).
And finally the aggregation.
# identify numbers (indexes) of cells located
# in every region (polygon in the map object)
cell <- cellnumbers(data_raster_EU,
map_nuts0)
# which creates a data.frame with
# two columns: object_ and cell_
# and then aggregate the values
# for all cell_(s) by object_
cell %>%
mutate(light = raster::extract(data_raster_EU,
cell$cell_)) %>%
group_by(object_) %>%
summarise(lights = sum(light, na.rm = TRUE)) -> lights_NUTS0Lets add a new column to the map data frame.
and plot it on a map
ggplot(map_nuts0) +
geom_sf(aes(fill = lights2013)) +
coord_sf() +
scale_fill_gradient(low = "black", high = "yellow") +
guides(fill = FALSE) +
theme_classic()Lets in addtion import data on GDP and population from Eurostat.
## Parsed with column specification:
## cols(
## NUTS_ID = col_character(),
## population2013 = col_double(),
## gdpcap2013 = col_double(),
## gdp2013 = col_double()
## )
To merge them correctly I convert all values to character.
map_nuts0$NUTS_ID <- as.character(map_nuts0$NUTS_ID)
eurostat$NUTS_ID <- as.character(eurostat$NUTS_ID)
map_nuts0_2 <-
map_nuts0 %>%
left_join(eurostat, by = "NUTS_ID")Lets check correlations between lights, log(lights) and GDP, GDP per capita and population.
map_nuts0_2 %>%
mutate(l_lights2013 = log(lights2013)) %>%
dplyr::select(population2013, gdp2013, gdpcap2013, lights2013, l_lights2013) %>%
st_drop_geometry() %>%
cor(use = "pairwise.complete.obs") ## population2013 gdp2013 gdpcap2013 lights2013
## population2013 1.0000000 0.88636395 -0.114257199 0.877584427
## gdp2013 0.8863639 1.00000000 0.068928187 0.876565845
## gdpcap2013 -0.1142572 0.06892819 1.000000000 -0.002230403
## lights2013 0.8775844 0.87656584 -0.002230403 1.000000000
## l_lights2013 0.6720392 0.62781102 -0.238638955 0.775179193
## l_lights2013
## population2013 0.6720392
## gdp2013 0.6278110
## gdpcap2013 -0.2386390
## lights2013 0.7751792
## l_lights2013 1.0000000
Scatterplot between GDP and lights.
ggplot(map_nuts0_2,
aes(x = gdp2013/1e+06,
y = lights2013/1e+06)) +
xlab("GDP in 2013") +
ylab("NTLI in 2013") +
geom_point(size = 5,
col = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
geom_label_repel(data = map_nuts0_2 %>%
filter(gdp2013/1e+06 > 1 | lights2013/1e+06 > 3),
aes(label = NUTS_ID),
size = 5) +
geom_text(x = 2, y = 3,
size = 8,
label = "cor(GDP, NTLI) = 0.88") +
geom_text(x = 2, y = 2,
size = 8,
label = "cor(GDP, lnNTLI) = 0.63") +
# title
ggtitle("Night-time lights intensity vs GDP in 2013 in EU countries") +
theme_bw()Scatterplot between population and lights.
ggplot(map_nuts0_2,
aes(x = population2013/1e+04,
y = lights2013/1e+06)) +
xlab("population in 2013") +
ylab("NTLI in 2013") +
geom_point(size = 5,
col = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
geom_label_repel(data = map_nuts0_2 %>%
filter(population2013/1e+04 > 3 |
lights2013/1e+06 > 3),
aes(label = NUTS_ID),
size = 5) +
geom_text(x = 5.5, y = 3,
size = 8,
label = "cor(pop, NTLI) = 0.88") +
geom_text(x = 5.5, y = 2,
size = 8,
label = "cor(pop, lnNTLI) = 0.67") +
ggtitle("Night-time lights intensity vs population in 2013 in EU countries") +
theme_bw()Import shapefile for European NUTS2 regions and limit its area to the mainland of Europe (cut the islands as before).
## Reading layer `NUTS_RG_10M_2013_4326_LEVL_2' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\NUTS_RG_10M_2013_4326_LEVL_2.shp' using driver `ESRI Shapefile'
## Simple feature collection with 320 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -63.08826 ymin: -21.38731 xmax: 55.83663 ymax: 71.17354
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Error in CPL_geos_op2(op, st_geometry(x), st_geometry(y)): Evaluation error: TopologyException: Input geom 0 is invalid: Self-intersection at or near point 13.167791844340858 66.658418257526989 at 13.167791844340858 66.658418257526989.
To be able to limit the raster data to the desired area one has to make sure that should check if all geometries in the sf object are valid. Intuitively the two-dimensional geometry (POLYGON) is valid, if its boundaries are created from subsequent connecting, but non-overlapping and non-intersecting sections. MULTIPOLYGON geometry is valid if it consists of valid POLYGONs.
Checking the validity of the geometries of the analyzed objects can be time-consuming (especially for very complex geometries), but it saves problems in their subsequent visualization or analysis.
##
## FALSE TRUE
## 9 311
Nine regions have incorrect geometries. They might me automatically corrected with st_make_valid() function from lwgeom package, which is the extension of sf. Its description can be found here: github.com/r-spatial/lwgeom.
And apply the crop() function once again.
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
Now it works fine. Next we adjust the coordinates.
And plot the map of EU NUTS2 regions superimposed on night-time lights distribution.
## png
## 2
For the use of cellnumbers() we need to simplify complex geometries to just POLYGONs.
And apply aggregation.
cell <- cellnumbers(data_raster_EU,
map_nuts2)
cell %>%
mutate(light = raster::extract(data_raster_EU,
cell$cell_)) %>%
group_by(object_) %>%
summarise(lights = sum(light, na.rm = TRUE)) -> lights_nuts2Lets add a new column to the map data frame.
## Error in `[[<-.data.frame`(`*tmp*`, i, value = c(72969, 361161, 207187, : replacement has 310 rows, data has 311
Lights data is missing for one region - which?
## [1] 294
## Simple feature collection with 1 feature and 5 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 11.92691 ymin: 59.79043 xmax: 11.92696 ymax: 59.79046
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## NUTS_ID LEVL_CODE CNTR_CODE NUTS_NAME FID
## 303 NO01 2 NO Oslo og Akershus NO01
## geometry
## 303 POLYGON ((11.92691 59.79046...
This region has a very strange shape – for simplicity, let’s assume it has the intensity of night lights equal to 0.
lights_nuts2 <- lights_nuts2 %>%
rbind(., data.frame(object_ = which_missing,
lights = 0)) %>%
arrange(object_)Add a new column to the map data frame.
And finally plot it on a map
ggplot(map_nuts2) +
geom_sf(aes(fill = lights2013)) +
coord_sf() +
scale_fill_gradient(low = "black", high = "yellow") +
guides(fill = FALSE) +
theme_classic()## Saving 10 x 5 in image
Lets in addtion import data on GDP and population for regions directly from Eurostat - with the use of eurostat package – see package tutorial.
To download data, you need to know their codes used by Eurostat.
This can be checked on the Eurostat website or in particular reports, for example GDP at regional level, Population statistics at regional level.
You can also check them by searching the Eurostat database by keywords:
## Registered S3 method overwritten by 'cli':
## method from
## print.boxx spatstat
## # A tibble: 6 x 3
## title code type
## <chr> <chr> <chr>
## 1 GDP and main components (output, expenditure and inc~ nama_10_gdp datas~
## 2 GDP and main components (output, expenditure and in~ namq_10_gdp datas~
## 3 Gross domestic product (GDP) at current market price~ nama_10r_2g~ datas~
## 4 Average annual population to calculate regional GDP ~ nama_10r_3p~ datas~
## 5 Gross domestic product (GDP) at current market price~ nama_10r_3g~ datas~
## 6 European Union trade mark (EUTM) applications per bi~ ipr_ta_gdpr datas~
Let’s download the GDP figures in the NUTS2 regions (nama_10r_2gdp).
## Table nama_10r_2gdp cached at C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_2gdp_num_code_TF.rds
Similarly for population figures.
## Table nama_10r_3popgdp cached at C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3popgdp_num_code_TF.rds
We need to add this data to the sf object with geometries - all datasets use common Eurostat codes for regions, but columns have different names.
map_nuts2 <- map_nuts2 %>%
left_join(gdp %>% dplyr::select(geo, values),
by = c("NUTS_ID" = "geo")) %>%
dplyr::rename(gdp2013 = values) %>%
left_join(pop %>% dplyr::select(geo, values),
by = c("NUTS_ID" = "geo")) %>%
dplyr::rename(pop2013 = values)## Warning: Column `NUTS_ID`/`geo` joining factors with different levels,
## coercing to character vector
## Warning: Column `NUTS_ID`/`geo` joining character vector and factor,
## coercing into character vector
Finally we check for correlations.
map_nuts2 %>%
mutate(l_lights2013 = log(lights2013 + 1)) %>%
# bo był jeden region z zerem
dplyr::select(ends_with("2013")) %>%
st_drop_geometry() %>%
cor(use = "pairwise.complete.obs")## lights2013 gdp2013 pop2013 l_lights2013
## lights2013 1.0000000 0.3516237 0.4219286 0.6588570
## gdp2013 0.3516237 1.0000000 0.8744452 0.2271030
## pop2013 0.4219286 0.8744452 1.0000000 0.3542235
## l_lights2013 0.6588570 0.2271030 0.3542235 1.0000000
And make a scatterplot between GDP and lights
ggplot(map_nuts2,
aes(x = gdp2013/100000,
y = lights2013/100000)) +
geom_point(size = 5,
colour = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
xlab("NTLI in 2013") +
geom_text(x = 2.5, y = 17,
size = 8,
label = "cor(GDP, NTLI) = 0.35") +
geom_text(x = 2.5, y = 15,
size = 8,
label = "cor(GDP, lnNTLI) = 0.23") +
geom_label_repel(data = map_nuts2 %>%
filter(gdp2013/100000 > 2 | lights2013/100000 > 10),
aes(label = NUTS_NAME),
size = 5) +
ggtitle("Night-time lights intensity vs GDP in 2013 in EU regions (NUTS 2)") +
theme_bw()## Warning: Removed 64 rows containing non-finite values (stat_smooth).
## Warning: Removed 64 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_label_repel).
Scatterplot between ligts and population in NUTS2 regions.
ggplot(map_nuts2,
aes(x = pop2013/1000,
y = lights2013/100000)) +
geom_point(size = 5,
colour = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
xlab("population in 2013") + ylab("lNTLI in 2013") +
geom_text(x = 10, y = 17,
size = 8,
label = "cor(pop, NTLI) = 0.42") +
geom_text(x = 10, y = 15,
size = 8,
label = "cor(pop, lnNTLI) = 0.66") +
geom_label_repel(data = map_nuts2 %>%
filter(pop2013/1000 > 7 |
lights2013/100000 > 10),
aes(label = NUTS_NAME),
size = 5) +
ggtitle("Night-time lights intensity vs population in 2013 in EU regions (NUTS 2)") +
theme_bw()## Warning: Removed 54 rows containing non-finite values (stat_smooth).
## Warning: Removed 54 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_label_repel).
## Reading layer `NUTS_RG_10M_2013_4326_LEVL_3' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\NUTS_RG_10M_2013_4326_LEVL_3.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1480 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -63.08826 ymin: -21.38731 xmax: 55.83663 ymax: 71.17354
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
Map of EU NUTS3 regions imposed on night-time lights intensity.
## png
## 2
map_nuts3 <- st_cast(map_nuts3, "POLYGON")
cell <- cellnumbers(data_raster_EU,
map_nuts3)
cell %>%
mutate(light = raster::extract(data_raster_EU,
cell$cell_)) %>%
group_by(object_) %>%
summarise(lights = sum(light,
na.rm = TRUE)) -> lights_nuts3
map_nuts3$lights2013 <- lights_nuts3$lightsVisualization on a map
ggplot(map_nuts3) +
geom_sf(aes(fill = lights2013)) +
coord_sf() +
scale_fill_gradient(low = "black", high = "yellow") +
guides(fill = FALSE) +
theme_classic()## Saving 10 x 5 in image
## Table nama_10r_3gdp cached at C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3gdp_num_code_TF.rds
gdp <- gdp %>%
filter(time == 2013,
unit == "MIO_PPS")
pop <- get_eurostat("nama_10r_3popgdp",
time_format = "num")## Reading cache file C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3popgdp_num_code_TF.rds
## Table nama_10r_3popgdp read from cache file: C:\Users\Piotr\AppData\Local\Temp\Rtmp61gJRc/eurostat/nama_10r_3popgdp_num_code_TF.rds
Merging the dada with the sf object.
map_nuts3 <- map_nuts3 %>%
left_join(gdp %>% dplyr::select(geo, values),
by = c("NUTS_ID" = "geo")) %>%
dplyr::rename(gdp2013 = values) %>%
left_join(pop %>% dplyr::select(geo, values),
by = c("NUTS_ID" = "geo")) %>%
dplyr::rename(pop2013 = values)## Warning: Column `NUTS_ID`/`geo` joining factors with different levels,
## coercing to character vector
## Warning: Column `NUTS_ID`/`geo` joining character vector and factor,
## coercing into character vector
And checking correlations.
map_nuts3 %>%
mutate(l_lights2013 = log(lights2013 + 1)) %>%
# bo był jeden region z zerem
dplyr::select(ends_with("2013")) %>%
st_drop_geometry() %>%
cor(use = "pairwise.complete.obs")## lights2013 gdp2013 pop2013 l_lights2013
## lights2013 1.0000000 0.4205909 0.5163390 0.8188150
## gdp2013 0.4205909 1.0000000 0.9006870 0.3250995
## pop2013 0.5163390 0.9006870 1.0000000 0.4703375
## l_lights2013 0.8188150 0.3250995 0.4703375 1.0000000
Scatterplot between lights and GDP
ggplot(map_nuts3,
aes(x = gdp2013/10000,
y = lights2013/100000)) +
# points
geom_point(size = 5,
colour = "darkblue") +
# regression line added
geom_smooth(method = "lm", se = FALSE) +
xlab("GDP in 2013") + ylab("lNTLI in 2013") +
geom_text(x = 17, y = 6,
size = 8,
label = "cor(GDP, NTLI) = 0.36") +
geom_text(x = 17, y = 5.5,
size = 8,
label = "cor(GDP, lnNTLI) = 0.28") +
geom_label_repel(data = map_nuts3 %>%
filter(gdp2013/10000 > 10 | lights2013/100000 > 4),
aes(label = NUTS_NAME),
size = 5) +
# title
ggtitle("Night-time lights intensity vs GDP in 2013 in EU regions (NUTS 3)") +
theme_bw()## Warning: Removed 363 rows containing non-finite values (stat_smooth).
## Warning: Removed 363 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_label_repel).
Scatterplot between population and nighttime lights intensity.
ggplot(map_nuts3,
aes(x = pop2013/1000,
y = lights2013/100000)) +
geom_point(size = 5,
colour = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
xlab("population in 2013") + ylab("lNTLI in 2013") +
geom_text(x = 4, y = 6,
size = 8,
label = "cor(pop, NTLI) = 0.51") +
geom_text(x = 4, y = 5.5,
size = 8,
label = "cor(pop, lnNTLI) = 0.46") +
geom_label_repel(data = map_nuts3 %>%
filter(pop2013/1000 > 2 | lights2013/100000 > 3.8),
aes(label = NUTS_NAME),
size = 5) +
ggtitle("Night-time lights intensity vs population in 2013 in EU regions (NUTS 3)") +
theme_bw()## Reading layer `tl_2018_us_state' from data source `C:\Users\Piotr\Dropbox\__biezace\2019-07-09_UseR_Toulouse\_prez\files\tl_2018_us_state.shp' using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 14 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
Adjusting coordinates projections between shapefile and raster data.
Checking validity.
##
## TRUE
## 56
Limiting the bounding box.
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
Then raster image is cropped to the are of US.
## png
## 2
One can clearly identify major US cities.
Masking the data for areas outside USA.
And display data truncated to the US area imposing once again the grid of states borders.
## png
## 2
map_US2 <- st_cast(map_US2, "POLYGON")
cell <- cellnumbers(data_raster_US,
map_US2)
cell %>%
mutate(light = raster::extract(data_raster_US,
cell$cell_)) %>%
group_by(object_) %>%
summarise(lights = sum(light,
na.rm = TRUE)) -> lights_US
map_US2$lights2013 <- lights_US$lightsResult on a map.
ggplot(map_US2) +
geom_sf(aes(fill = lights2013)) +
coord_sf() +
scale_fill_gradient(low = "black", high = "yellow") +
guides(fill = FALSE) +
theme_classic()Importing data about state GDP and population.
## Parsed with column specification:
## cols(
## GeoFips = col_character(),
## code = col_character(),
## GeoName = col_character(),
## gdp2013 = col_double(),
## gdp2016 = col_double(),
## pop2013 = col_double(),
## pop2016 = col_double()
## )
To merge the data correctly let’s convert all state codes in a map file to character values.
map_US2$code <- as.character(map_US2$STUSPS)
map_US2_2 <-
map_US2 %>%
left_join(census, by = "code")Correlation matrix
map_US2_2 %>%
mutate(l_lights2013 = log(lights2013)) %>%
dplyr::select(lights2013, l_lights2013, gdp2013, pop2013) %>%
st_drop_geometry() %>%
cor(use = "pairwise.complete.obs")## lights2013 l_lights2013 gdp2013 pop2013
## lights2013 1.0000000 0.8163420 0.7847893 0.8331715
## l_lights2013 0.8163420 1.0000000 0.5669594 0.6269202
## gdp2013 0.7847893 0.5669594 1.0000000 0.9845955
## pop2013 0.8331715 0.6269202 0.9845955 1.0000000
Scatterplot between GDP and lights in US states.
ggplot(map_US2_2,
aes(x = gdp2013/1e+5,
y = lights2013/1e+5)) +
xlab("GDP in 2013") +
ylab("lNTLI in 2013") +
geom_point(size = 5,
col = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
geom_label_repel(data = map_US2_2 %>%
filter(gdp2013/1e+5 > 10),
aes(label = code),
size = 5) +
geom_text(x = 15, y = 15,
size = 8,
label = "cor(GDP, NTLI) = 0.78") +
geom_text(x = 15, y = 10,
size = 8,
label = "cor(GDP, lnNTLI) = 0.82") +
ggtitle("Night-time lights intensity vs GDP in 2013 in US states") +
theme_bw()## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
Scatterplot between population and lights in US states.
ggplot(map_US2_2,
aes(x = pop2013/1e+5,
y = lights2013/1e+5)) +
xlab("population in 2013") +
ylab("lNTLI in 2013") +
geom_point(size = 5,
col = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
geom_label_repel(data = map_US2_2 %>%
filter(pop2013/1e+5 > 150),
aes(label = code),
size = 5) +
geom_text(x = 250, y = 15,
size = 8,
label = "cor(pop, NTLI) = 0.83") +
geom_text(x = 250, y = 10,
size = 8,
label = "cor(pop, lnNTLI) = 0.63") +
ggtitle("Night-time lights intensity vs population in 2013 in US states") +
theme_bw()## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
In case of world counties lets use wrld_simpl object from the maptools package converted to sf object.
Lets adjust the coordinates projections
## [1] "+proj=longlat +datum=WGS84 +no_defs"
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
and check validity of geometries.
##
## FALSE TRUE
## 4 242
Let’s also convert the type of geometries to POLYGONs.
## Warning in st_cast.sf(map_world, "POLYGON"): repeating attributes for all
## sub-geometries for which they may not be constant
It appears that the extent of the map is larger than the extent of the raster data.
## class : Extent
## xmin : -180
## xmax : 180
## ymin : -90
## ymax : 83.57027
## class : Extent
## xmin : -180.0042
## xmax : 180.0042
## ymin : -65.00417
## ymax : 75.00417
Let’s limit the map to the extent of raster object.
my_bbox = c(xmin = ext_r@xmin,
ymin = ext_r@ymin,
xmax = ext_r@xmax,
ymax = ext_r@ymax)
map_world2 <- st_crop(map_world2,
my_bbox)## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant
## throughout all geometries
As operating on the whole raster data may be too demanding (in this case R will try to read all data into memory), we will make the aggregation using a loop and sum night-time lights intensity for every country separately (in fact every POLYGON as we have just converted original data into POLYGONs).
CAUTION! This loop run for about 45 minutes on my laptop. Let me know if you find a more (time) efficient way :)
Let’s collect the calculated values in a numeric vector.
lights_world <- array(NA, nrow(map_world2))
for (i in 1:nrow(map_world2)) {
message(i)
map_i <- map_world2[i,]
data_raster_i <- crop(data_raster,
extent(map_i))
cell <- cellnumbers(data_raster_i,
map_i)
if (nrow(cell) > 0) {
cell %>%
mutate(light = raster::extract(data_raster_i,
cell$cell_)) %>%
group_by(object_) %>%
summarise(lights = sum(light, na.rm = TRUE)) -> lights_i
lights_world[i] <- as.numeric(lights_i$lights)
} else
lights_world[i] <- 0
rm(map_i, data_raster_i, cell, lights_i)
gc()
}Let’s add the values to the map data and finally aggregate over POLYGONs within countries.
map_world2$lights2013 <- lights_world
lights_world_counties <- map_world2 %>%
group_by(FIPS, ISO2, ISO3, UN, NAME) %>%
summarise(lights2013 = sum(lights2013,
na.rm = TRUE)) %>%
ungroup()Finally NTLI data by countries are added to the original map data.
map_world <- map_world %>%
left_join(st_drop_geometry(lights_world_counties) %>%
dplyr::select(ISO2, lights2013))## Joining, by = "ISO2"
Data about night-time lights intensity will be correlated with GDP and population, similarly as in previous examples. For countries of the world such data can be imported in R directrly from the World Development Indicators database with the WDI package.
To request the indicator GDP (Current US$) we need to use its code NY.GDP.MKTP.CD.
One can also search for the code with the WDIsearch() function.
## indicator
## [1,] "BX.TRF.PWKR.GD.ZS"
## [2,] "BX.TRF.PWKR.DT.GD.ZS"
## [3,] "BX.TRF.MGR.DT.GD.ZS"
## [4,] "BX.KLT.DINV.WD.GD.ZS"
## [5,] "BX.KLT.DINV.DT.GD.ZS"
## [6,] "BX.GSR.MRCH.ZS"
## [7,] "6.0.GDPpc_constant"
## [8,] "6.0.GDP_usd"
## [9,] "6.0.GDP_growth"
## [10,] "6.0.GDP_current"
## name
## [1,] "Workers' remittances, receipts (% of GDP)"
## [2,] "Personal remittances, received (% of GDP)"
## [3,] "Migrant remittance inflows (% of GDP)"
## [4,] "Foreign direct investment, net inflows (% of GDP)"
## [5,] "Foreign direct investment, net inflows (% of GDP)"
## [6,] "Merchandise exports (BOP): percentage of GDP (%)"
## [7,] "GDP per capita, PPP (constant 2011 international $) "
## [8,] "GDP (constant 2005 $)"
## [9,] "GDP growth (annual %)"
## [10,] "GDP (current $)"
We also need data on population:
Let’s put both data together.
## iso2c country year NY.GDP.MKTP.CD
## 1 1A Arab World 2013 2.867265e+12
## 2 1W World 2013 7.718995e+13
## 3 4E East Asia & Pacific (excluding high income) 2013 1.182778e+13
## 4 7E Europe & Central Asia (excluding high income) 2013 4.314982e+12
## 5 8S South Asia 2013 2.357175e+12
## 6 AD Andorra 2013 3.281585e+09
## SP.POP.TOTL
## 1 379705719
## 2 7170961674
## 3 2008932012
## 4 405778196
## 5 1705772050
## 6 80774
There is a column called iso2c which includes the same country codes as ISO2 in the map file, so data can be easily put together with the map object. Lets give informative names to the last two columns.
map_world <- map_world %>%
left_join(world_data, by = c("ISO2" = "iso2c")) %>%
rename(pop2013 = SP.POP.TOTL,
gdp2013 = NY.GDP.MKTP.CD)## Warning: Column `ISO2`/`iso2c` joining factor and character vector,
## coercing into character vector
In the last step we will calculate the correlation matrix.
map_world %>%
mutate(l_lights2013 = log(1+lights2013)) %>%
dplyr::select(lights2013, l_lights2013, gdp2013, pop2013) %>%
st_drop_geometry() %>%
cor(use = "pairwise.complete.obs")## lights2013 l_lights2013 gdp2013 pop2013
## lights2013 1.0000000 0.3891884 0.9150627 0.5713779
## l_lights2013 0.3891884 1.0000000 0.3923265 0.3312246
## gdp2013 0.9150627 0.3923265 1.0000000 0.5338668
## pop2013 0.5713779 0.3312246 0.5338668 1.0000000
and present relationships graphically.
ggplot(map_world,
aes(x = gdp2013/1e+12,
y = lights2013/1e+7)) +
xlab("GDP in 2013") +
ylab("NTLI in 2013") +
geom_point(size = 5,
col = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
geom_label_repel(data = map_world %>%
filter(gdp2013/1e+12 > 3 |
lights2013/1e+7 > 1.5),
aes(label = country),
size = 5) +
geom_text(x = 5, y = 6.5,
size = 8,
label = "cor(GDP, NTLI) = 0.91") +
geom_text(x = 5, y = 6,
size = 8,
label = "cor(GDP, lnNTLI) = 0.39") +
ggtitle("Night-time lights intensity vs GDP in 2013 in world countries") +
theme_bw()## Warning: Removed 43 rows containing non-finite values (stat_smooth).
## Warning: Removed 43 rows containing missing values (geom_point).
And the same for population vs night-time lights.
ggplot(map_world,
aes(x = pop2013/1e+8,
y = lights2013/1e+7)) +
xlab("population in 2013") +
ylab("NTLI in 2013") +
geom_point(size = 5,
col = "darkblue") +
geom_smooth(method = "lm", se = FALSE) +
geom_label_repel(data = map_world %>%
filter(pop2013/1e+8 > 2.5 |
lights2013/1e+7 > 1.5),
aes(label = country),
size = 5) +
geom_text(x = 8, y = 6.5,
size = 8,
label = "cor(pop, NTLI) = 0.57") +
geom_text(x = 8, y = 6,
size = 8,
label = "cor(pop, lnNTLI) = 0.33") +
ggtitle("Night-time lights intensity vs population in 2013 in world countries") +
theme_bw()## Warning: Removed 35 rows containing non-finite values (stat_smooth).
## Warning: Removed 35 rows containing missing values (geom_point).